It is suggested that replication projects may be valuable in teaching research methods, and also address the 
current need in psychology for more independent verification of published studies. Their use in an 
undergraduate methods course is described, involving student teams who performed direct replications of 
four well-known experiments, yielding results which were subsequently published online. Illustrative data are 
given for the one successful replication and three failures obtained, and practical suggestions are given for 
incorporating replication projects into a methods course as an alternative to the usual term project. It is also 
noted that the published success rates of replication attempts appear to be higher for those studies that were 
performed as class projects. 
This paper points to a pressing need in 
scientific psychology today for increased 
replication of earlier studies, and suggests 
a practical solution to this problem which also 
carries potential benefits for both students 
and teachers in research methods courses. 

The replication crisis today 

Rosenthal’s (1979) seminal paper first 
pointed out the ‘file drawer problem’: 
journals print papers that are often based on 
Type I errors while the file drawers of 
researchers contain unpublished studies of 
the same topics showing null outcomes. 
Many commentators today believe that there 
is a ‘crisis’ within the discipline of psychology 
(Laws, 2013; Neuliep, 1990; Pashler & 
Wagenmakers, 2012; Ritchie, Wiseman & 
French, 2012; Yong, 2012). Very few reports 
of replication studies appear in psychology 
journals, despite a growing realisation that 
even major studies appearing in leading 
journals often fail to replicate successfully 
when this is attempted by independent investigators 
(Schmidt, 2009). The recent special 
issue of Perspectives on Psychological Science 
confirms the seriousness of this problem 
(Pashler & Harris, 2012), which also occurs 
in other scientific fields, for example, 
cancer research (Begley & Ellis, 2012). 


Only 13 successful replications are 
reported among the 44 attempts that are 
posted currently on the PsychFileDrawer.org 
website, a success rate of 30 per cent overall; 
the targets are 30 prominent articles in 
human psychology. The most common area 
represented is priming, with 10 target studies 
and 18 replication attempts (sometimes 
more than one per target article) that 
yielded four successful replications and 14 
failures, a success rate of only 22 per cent. It 
has also been noted that only about one per 
cent of articles in psychology journals 
clearly involve replication attempts, and of 
these possibly 93 per cent succeed when 
published by the original authors, but only 
69 per cent when published by new 
authors (Makel, Plucker & Hegarty, 2012). 
However, the pervasive file drawer problem 
means that the true situation is worse, since 
journals in most disciplines, including 
psychology, generally publish only positive 
results (Fanelli, 2011), resulting in the 
‘...promulgation of numerous undead theories 
that are ideologically popular but have 
little basis in fact’ (Ferguson & Heene, 2012, 
p.555). 

Students in research methods courses are 
taught that replicability is a cardinal feature 
of the scientific method (e.g. McBurney & 
White, 2010, pp.208-210), but without 
demonstrable repeatability of findings the 
scientific enterprise is invalidated on logical 
grounds alone. It seems hypocritical to 
present this idea as a guarantee of methodological 
rigour when it is being violated in 
many cases today. Ioannidis (2012) argues 
that the estimate of 53 per cent proposed by 
Pashler and Harris (2012) for fallacies that 
are perpetuated in the literature 
may be too low, and that the true figure 
could reach 95 per cent in certain fields of 
psychology. At the least, numerical analyses 
suggest that a majority of published research 
findings are false (Ioannidis, 2005). 

If a replication crisis exists, then 
constructive solutions are needed. The few 
replications that are published today lead to 
exchanges that are often protracted, inconclusive, 
or abrasive (e.g. Byrne et al., 1966; 
Chabris et al., 1999; Dijksterhuis, 2013; 
Doyen et al., 2012; Shanks et al., 2013; 
Wagenmakers et al., 2011). 

Students as a strategic resource 

Many suggestions have been advanced which 
ideally might overcome the replication 
problem (Asendorpf et al., 2013), for 
example, rewarding replication research as 
much as original work, requiring external 
replication of results before publication, or 
requiring studies to be registered in advance 
with a central registry. Brandt et al. (2014) 
have elaborated these ideas into a Replication 
Recipe with a checklist of 36 questions 
to guarantee a ‘convincing’ replication, but 
the likelihood of its general implementation 
appears minimal, due to reviewer bias 
against replications (Neuliep & Crandall, 
1993), scarce journal space, the low prestige 
attached to replications, time constraints on 
researchers, and the restriction of funding to 
new work. 

Some initiatives are evident here, since 
the online journal BioMed Central Psychology 
has pledged that it will ‘put less emphasis on 
interest levels’, and will publish repeat 
studies and negative results (Laws, 2013). 
Also the Association for Psychological 
Science has announced that its journal 
Perspectives on Psychological Science is now 
running a replication project with a standardised 
protocol (Gage, 2013; ‘Registered 
replication reports’, 2013), and PsychDisclosure.org 
has appeared as a public database 
for recently published articles, to provide 
additional methodological details (Lebel et 
al., 2013). Finally, occasional multi-experiment 
studies are published which exhaustively 
examine the replicability of a given 
effect (e.g. Shanks et al., 2013, present nine 
studies, with 475 participants, which all 
failed to replicate the intelligence priming 
effect). 

However, approximately 123,000 new 
entries for peer-reviewed journal articles are 
added each year to PsycINFO, which already 
includes the abstracts of over 2,400,000 articles 
(only 32,000 of these, or 1.3 per cent, 
include the stem replicat* in the abstract). 
With so many unverified research findings, 
the most plausible way to reduce the 
problem may be to employ students as 
collaborators, as suggested by Grahe et al. 
(2012), and by Frank and Saxe (2012). As 
these authors argue, this would be good for 
the public scientific accountability of 
psychology, as well as for students and 
teachers. Students may represent an abundant 
and underutilised resource, assuming 
they may be relied upon to collect valid data, 
and in comparison with this possible 
problem, the likely payoffs appear considerable: 
enhanced scientific integrity of the 
field, a more manageable term project for 
the student, and easier classwork for a 
teacher to set up. Furthermore the new 
online posting forum PsychFileDrawer.org 
provides a simple, cost-efficient way to build 
up over time a public database of replication 
attempts and outcomes. 

Typical problems with the traditional 
independent term project 

Although a few individual student projects 
are published, on rare occasions becoming 
well-known (e.g. Pheterson, Kiesler & Goldberg, 
1971; ‘This week’s citation classic’, 
1983), and an individual project might 
appear to be a desirable adjunct to standard 
lab exercises that are set up by the instructor, 
our experience over the years has been that 
most students find it very difficult within a 
single term or semester to meet the requirements 
for a meaningful independent 
project: an original hypothesis based on relevant 
literature, operationalisation of the key 
idea, preparation of materials, ethics 
approval, pilot testing and data collection, to 
be followed by data analysis and a written 
report. 

For students who are fairly new to 
psychology, with other work to perform in 
the methods course, this is a major challenge, 
with time pressure often leading to 
experiments that have been described as 
‘...silly, poorly designed, and unlikely to 
connect to current issues in psychological 
science’ (Frank & Saxe, 2012, p.601). 
Proposed projects may show little theoretical 
understanding, knowledge of the literature, 
or scientific awareness. Too often the experimental 
hypothesis is unclear, or has been 
created by the teacher, with methodological 
flaws sometimes evident, and samples are 
usually so small that statistical power is low. 
By way of contrast, a two-term honours 
dissertation project involves a manageable 
schedule, generally leading to a much better 
study and possibly a joint publication with 
the supervisor. 

For the above reasons, the senior author 
decided to offer students the option of 
performing a replication project in place of 
an independent term project. (All chose this 
more structured option.) Direct rather than 
conceptual replications were used, since the 
latter are logically subordinate to studies 
which first test for the existence of an effect 
before attempts are made to generalise it. 

Implementing replication projects 

This approach was developed in a three- 
credit advanced research methods class at a 
small university. This course is taken in the 
second year of a three-year psychology 
honours BA programme, and requires prior 
credit for a basic research methods course 
and two statistics courses, all taken in the first 
year, with an average grade of at least 75 per 
cent. It is not likely that undergraduates 
without a prior background in methods or 
statistics would be suitable for this approach. 

The instructor and class first chose four 
target articles from the current listing of 
papers given on PsychFileDrawer.org, with 
the results of at least one prior replication 
attempt posted, whether successful or not. 
(This is not essential, but provides a way to 
compare the class results not only with the 
target article but also with other replication 
attempts.) 

To do this, the original papers listed in 
PsychFileDrawer.org were all scrutinised, 
and eliminated as possible targets if they 
involved practical difficulties regarding apparatus or 
materials, a very specialised or technical 
focus, lengthy or individualised testing 
requirements, or a complicated design. 
Instances were also noted of marginal statistical 
significance, low effect size, or large 
numbers of subjects (as we wished at least to 
match the N of each target study). When two 
experiments within a given study both 
appeared to be suitable, the one reporting 
the larger effect size was chosen so as to 
maximise statistical power. Each of the 
chosen papers included multiple 
experiments, from which only one was 
selected for replication. The selection stage 
provides a useful opportunity to discuss with 
the class the issues of power, significance, 
and effect size by examining the original 
study and the number of subjects needed for 
the replication, facilitated by computational 
software (e.g. Allen & Hannent, 2013). 
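
The link between effect size and the number of subjects needed can be illustrated with a short calculation. The following is a minimal sketch only, assuming a two-group between-subjects comparison, a two-sided alpha of .05 and a target power of .80 (none of these values is specified in the text). It uses the normal approximation rather than the exact t-based calculation that dedicated power software performs, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d.

    Normal-approximation formula for a two-sided, two-sample comparison:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    It slightly underestimates the exact t-test requirement.
    """
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Illustrative effect sizes only (assumed values, not study data).
    for d in (0.3, 0.5, 0.8):
        print(f"d = {d:.1f}: about {n_per_group(d)} participants per group")
```

Because the required n grows with the inverse square of d, a target study reporting a larger effect size can be replicated at a given power with fewer subjects, which is the rationale for preferring the experiment with the larger reported effect.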

This process occupied about two weeks 
and led to the selection of the four papers 
posted in PsychFileDrawer.org which best 
met the above criteria. Three of these were 
priming studies (Dijksterhuis & van Knippenberg, 
1998, Experiment 4; Vohs, Mead & 
Goode, 2006, Experiment 3; Williams & 
Bargh, 2008, Study 3). These studies 
reported that primes related to intelligence, 
money, or distance, respectively, raise cognitive 
performance, reduce helpfulness, and 
cause lower caloric estimates for unhealthy 
foods. Despite recent controversy concerning 
the validity of some priming experiments 
(Bartlett, 2013), the selection of these 
three priming studies as targets was fortuitous 
and due solely to the above-mentioned 
criteria. The fourth study selected (Gailliot 
et al., 2007, Study 8) reported that ingesting 
a glucose drink enhances a subject’s self-control 
when it has been threatened by 
thoughts of mortality. 

Organising the teams 

Although a replication may be performed by 
a pair of lab partners (Frank & Saxe, 2012), 
this would entail heavy time demands on the 
student, so we had the 24 members of the 
class sign up into four teams of six each, to 
spread the time load. This team size proved 
satisfactory, although the final choice is arbitrary. 

One member of each team was designated 
as a co-ordinator, on the basis of a 
group vote, with responsibilities for 
preparing materials, consulting regularly 
with the instructor, organising team 
members to carry out the testing, and storing 
the data. Co-ordinators were required to test 
only a handful of subjects, for the experience, 
so as to equalise the total time load for 
them and their team members. There is no 
need to make all teams the same size, and if 
a given project requires many subjects to 
match the original sample size (or involves a 
lengthy testing procedure, or individual as 
opposed to group testing), the sizes of the 
teams may be adjusted so as to balance out 
the workload per student. Teams were 
encouraged to take personal responsibility 
for their projects. 

Each student next wrote a graded 
proposal of about 10 pages in APA format, 
which they had already encountered in class, 
to ensure that they would be familiar with 
their target study. This proposal included a 
short literature review, details of the method 
and procedure gleaned from the target 
paper, and an outline template for the 
results section, as well as standard forms for 
informed consent and debriefing. The 
instructor handled the paperwork to obtain 
approval from the research ethics board for 
each project. 

The instructor then met with each team 
and discussed their project to clarify 
details of its methods and procedure. 
For example, if the time allowed for a certain 
test was not specified in the report, a decision 
was made as to how to standardise it. 
Every attempt was made to follow the 
specifics of the target article closely, and in 
some cases details were obtained from the 
original authors by email. The numbers of 
subjects ultimately tested in the four studies 
were similar to those in the target papers, 
with respective values as follows (target 
article Ns are given in parentheses): 48 (43), 
58 (73), 40 (39), and 60 (59) participants. 

Testing participants 

The instructor met the co-ordinators individually 
to run through a trial testing session, 
taking the role of the subject. It is important 
first to create a standardised written protocol 
for all student testers to follow, once the 
details of materials and apparatus have been 
finalised. Each team thereafter largely ran 
their own project, performing their assigned 
testing in a departmental lab. When pilot 
testing and data collection had been 
completed, each co-ordinator collated their 
team’s data sheets and prepared a composite 
SPSS data file in conjunction with the 
instructor, who cross-checked their statistical 
analysis. This data file was sent to the team 
members for use in writing up their reports, 
which were graded individually. Although 
participation in fact proved to be excellent, 
this was safeguarded by explaining in the first 
class meeting that each person’s contribution 
would be rated at the end by the other 
members of their team, using the Peer Evaluation 
Form of Herreid (2001). No objections 
were raised. 

Results 

With some monitoring by the instructor, the 
experiments all proceeded smoothly, with 
less stress evident over time pressures than is 
usual with term projects, and few logistic 
issues arising. 

At the end of each project, each team 
member submitted a full report in APA 
format, which enlarged upon their proposal 
and included data, analysis, and conclusions. 
The report by each co-ordinator, suitably 
edited, also served as the preliminary basis 
for a composite final report, with their team 
members and the instructor listed as additional 
authors, which was posted on the 
public archive of replication attempts maintained 
at PsychFileDrawer.org (Grenier et 
al., 2012; Lane et al., 2012; Roberts et al., 
2012; Sykes et al., 2012). 

The four projects overall yielded one 
successful replication (of Gailliot et al., 
2007), which showed almost the same effect 
size as the original study, and three failures 
(involving the three priming studies). Two of 
these three failures yielded non-significant 
differences in the opposite direction to the 
target article, and the composite effect size, 
averaged over the four experiments, was 
effectively zero. The data, given in Table 1, 
are quite typical for PsychFileDrawer.org, 
which lists six other failures to replicate 
these three priming studies, and no 
successes. 

It may be noted that even though we 
were able to replicate Study 8 of 
Gailliot et al. (2007), PsychFileDrawer.org lists an 
unsuccessful attempt at replicating Study 7 
of that paper (Cesario & Corker, 2010), and 
the theory of glucose as an aid to self-control 
has elsewhere been challenged (Kurzban, 
2011; Molden et al., 2012). Conversely, 
although we failed to replicate Vohs et al. 
(2006), other studies have found a link 
between money-priming and reduced helpfulness 
(e.g. Chatterjee, Rose & Sinha, 2013; 
Gasiorowska & Helka, 2012; Roberts & 
Roberts, 2012). A failure to replicate a given 
study does not prove that it was faulty. 


Table 1: Comparison of target study and replication results. 

Target  Target study finding                     Target study    Replication     Did the
study                                            result          result          replication
                                                 d       p       d       p       succeed?
1       Verbal priming raises intelligence       0.85    <.02    -0.29   .84     No
2       Glucose aids self-control processes      0.65    .05     0.30    .03     Yes
3       Money priming reduces helpfulness        0.66    <.05    0.08    .42     No
4       Distance prime alters calorie estimates  0.90    .04     -0.023  .70     No
        Mean d=                                  0.69            0.017

Note: d=effect size 
Target study 1=Dijksterhuis & van Knippenberg (1998), Experiment 4. 
Target study 2=Gailliot et al. (2007), Study 8. 
Target study 3=Vohs, Mead & Goode (2006), Experiment 3. 
Target study 4=Williams & Bargh (2008), Study 3. 
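
The composite replication effect size described in the text is a simple unweighted average of the four replication d values in Table 1. As a minimal sketch (the variable names are illustrative, and no weighting by sample size is assumed):

```python
# Replication effect sizes (d) as listed in Table 1, studies 1 to 4.
replication_d = [-0.29, 0.30, 0.08, -0.023]

# Unweighted mean across the four replication attempts.
mean_d = sum(replication_d) / len(replication_d)
print(f"Mean replication d = {mean_d:.3f}")
```

The positive and negative values largely cancel, which is why the averaged effect is close to zero.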

Conclusions and discussion 

We believe that the approach outlined here 
can bridge the gap between the classroom 
and the world of psychological research, 
yielding much-needed scientific information 
as well as meaningful project work that 
incorporates diverse active/reflective and 
abstract/concrete aspects of learning (Kolb, 
1984). Students encountered replication as a 
requirement for scientific progress, examined 
issues of power and effect size, and 
engaged in some critical thinking. They also 
improved their report writing, computing, 
and statistical skills, and learned to work 
collaboratively. In comparison with previous 
years, it was easier to organise the class work, 
and students were less anxious over their 
projects, working in teams within a pre-established 
template. This team interaction may 
help with the problem of statistics-anxiety 
(Williams, 2010). They showed good motivation, 
contributing suggestions and raising 
points for clarification, and seemed pleased 
when they finally achieved an online publication. 

A further benefit of this approach is that 
it involves far fewer distinct projects, 
using pre-established methodology, which 
may facilitate early approval from an ethics 
board, whereas with individual projects a 
month’s delay can occur, seriously 
hampering the collection of data. 

The ultimate benefit of replication 
studies may, we hope, spread throughout the 
field of psychology as the reliability of major 
studies is clarified through further internet 
postings. This approach appears to produce 
a win-win-win situation for the field, students, 
and teachers. We encountered no 
real problems, and can recommend this idea 
for adoption by instructors who are teaching 
a methods or lab course with students who 
already know some basic statistics and 
methodology. 

The responsibility for writing the 
proposal and the final report may be 
assigned to individuals, pairs of lab partners, 
or teams. The basic framework is flexible, 
and could be adapted so that, for example, 
some teams independently attempt to replicate 
the same target article, to explore a 
further dimension of replicability. Or 
different teams could attempt to repeat the 
various experiments comprising a given 
paper. Repeating a replication attempt in 
successive years would provide another 
index of reliability. In the case of a between-subjects 
design, extra groups can be added 
to create a replication-extension study where 
both an exact and a conceptual replication 
are attempted concurrently, to test the reliability 
of the original finding and explore 
moderator effects which might explain it 
(Bonett, 2012). This approach might be suitable 
for an honours dissertation (e.g. Carlin 
& Standing, 2013), or a graduate thesis. 
Some studies are beyond the capacities of 
undergraduate students, but may be suited 
to graduate students (Frank & Saxe, 2012). 

It has been argued that failures to replicate 
occur because psychology students do 
not have the necessary methodological skills 
to perform valid studies (Dijksterhuis, 2013), 
and if there is any increased noise in a system 
due to inexperienced testers then this might 
seem more likely to produce a Type II than a 
Type I error. But the evidence to date 
suggests otherwise. Fourteen of the 44 
studies listed by PsychFileDrawer.org 
(at 6 January 2014) are listed as having been 
performed as class projects. These class projects 
actually showed a higher probability of 
achieving replication (seven successes in 14 
attempts) than the remaining studies (six 
successes out of 30). This difference is reliable 
(Fisher’s exact test, p=.049; Cramér’s V, 
p=.042), a result which directly contradicts 
Dijksterhuis’s hypothesis, perhaps because 
the file drawer effect operates more strongly 
when student investigators are involved. 
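
The comparison of class projects (seven successes, seven failures) with the remaining studies (six successes, 24 failures) can be recomputed from the counts alone. The sketch below is illustrative only: the function names are hypothetical, it uses only the Python standard library, and it assumes an uncorrected chi-square test for the association measure, since the text does not state which options were used. Under these assumptions the one-sided Fisher value comes out close to .049 and the chi-square p close to .042, which would be consistent with the reported figures, while the two-sided Fisher value is somewhat larger (about .07).

```python
import math


def fisher_exact_2x2(a, b, c, d):
    """Return (one_sided_p, two_sided_p) for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are success/failure. The one-sided p is the
    probability of a or more successes in row 1 under the hypergeometric
    null; the two-sided p sums all tables no more probable than the
    observed one. Integer counts are used so comparisons are exact.
    """
    row1, row2 = a + b, c + d
    successes = a + c
    total = row1 + row2

    def ways(k):
        # Number of ways to place k of the successes in row 1.
        return math.comb(row1, k) * math.comb(row2, successes - k)

    k_min = max(0, successes - row2)
    k_max = min(row1, successes)
    all_ways = math.comb(total, successes)
    observed = ways(a)

    one_sided = sum(ways(k) for k in range(a, k_max + 1)) / all_ways
    two_sided = sum(
        ways(k) for k in range(k_min, k_max + 1) if ways(k) <= observed
    ) / all_ways
    return one_sided, two_sided


def chi_square_2x2(a, b, c, d):
    """Return (chi2, p, phi) for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df = 1
    phi = math.sqrt(chi2 / n)  # equals Cramér's V for a 2x2 table
    return chi2, p, phi


if __name__ == "__main__":
    # Class projects: 7 successes, 7 failures; other studies: 6 and 24.
    one_sided, two_sided = fisher_exact_2x2(7, 7, 6, 24)
    chi2, p_chi2, phi = chi_square_2x2(7, 7, 6, 24)
    print(f"Fisher one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
    print(f"chi-square = {chi2:.2f}, p = {p_chi2:.3f}, phi = {phi:.2f}")
```

With only 44 attempts in total, the conclusion is sensitive to the choice between a one-sided and a two-sided test, so the difference is best read as suggestive.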

We suggest that for simple studies, 
performed with faculty monitoring, a satisfactory 
level of reliability can probably be 
achieved. We note that the target papers for 
replication are themselves often based on 
data that were collected by students under 
supervision. Should this view seem too optimistic, 
then a control test would resolve the 
issue. If it is possible to identify some experimental 
effect that unquestionably does exist, 
then we could determine how many out of 
100 student experimenters succeed in reproducing 
it. A validity check of this type would 
complement the current approach and assess 
whether serious measurement error is introduced 
by the use of student investigators. 